The Howard University System Submission for the Shared Task in Language Identification in Spanish-English Codeswitching
Abstract
This paper describes the Howard University system for the language identification shared task of the Second Workshop on Computational Approaches to Code Switching. The system builds on prior work on Swahili-English token-level language identification. It relies primarily on character n-gram, prefix, suffix, letter-case, and special-character features, along with previously existing tools. For the final system, these features are combined with generated label probabilities for the token's immediate context.
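To make the feature types named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`token_features`, `add_context_probabilities`), the n-gram sizes, the affix length, the set of "special" characters, and the two-pass layout are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import re


def token_features(token, n_values=(2, 3), affix_len=3):
    """Hypothetical per-token feature extractor (all settings are assumptions)."""
    features = {}
    lower = token.lower()

    # Character n-grams over the lowercased token, with boundary markers.
    padded = f"^{lower}$"
    for n in n_values:
        for i in range(len(padded) - n + 1):
            features[f"ng{n}={padded[i:i + n]}"] = 1

    # Prefix and suffix features of length 1..affix_len.
    for k in range(1, affix_len + 1):
        if len(lower) >= k:
            features[f"pre{k}={lower[:k]}"] = 1
            features[f"suf{k}={lower[-k:]}"] = 1

    # Letter-case features.
    features["is_title"] = int(token.istitle())
    features["is_upper"] = int(token.isupper())

    # Special-character features (the character set here is an assumption).
    features["has_digit"] = int(any(c.isdigit() for c in token))
    features["has_special"] = int(bool(re.search(r"[@#_:/]", token)))
    features["has_non_ascii"] = int(any(ord(c) > 127 for c in token))
    return features


def add_context_probabilities(features, first_pass_probs):
    """Append neighbours' label probabilities to each token's features.

    `first_pass_probs` is assumed to hold one {label: probability} dict per
    token, produced by some earlier classifier that is not shown here.
    """
    augmented = []
    for i, feats in enumerate(features):
        merged = dict(feats)
        for offset, name in ((-1, "prev"), (1, "next")):
            j = i + offset
            if 0 <= j < len(first_pass_probs):
                for label, prob in first_pass_probs[j].items():
                    merged[f"{name}_p_{label}"] = prob
        augmented.append(merged)
    return augmented
```

Each token ends up as a plain dictionary of named features, which could then be passed to any classifier that accepts sparse feature dictionaries. The probability values in a real run would come from a trained first-pass model; none are supplied here.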
Similar Resources
The CMU Submission for the Shared Task on Language Identification in Code-Switched Data
We describe the CMU submission for the 2014 shared task on language identification in code-switched data. We participated in all four language pairs: Spanish–English, Mandarin–English, Nepali–English, and Modern Standard Arabic–Arabic dialects. After describing our CRF-based baseline system, we discuss three extensions for learning from unlabeled data: semi-supervised learning, word embeddings,...
Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description
We describe our present system for language identification as a part of the EMNLP 2016 Shared Task. We were provided with the Spanish-English corpus composed of tweets. We have employed a predictor-corrector algorithm to accomplish the goals of this shared task and analyzed the results obtained.
The CUED HiFST System for the WMT10 Translation Shared Task
This paper describes the Cambridge University Engineering Department submission to the Fifth Workshop on Statistical Machine Translation. We report results for the French-English and Spanish-English shared translation tasks in both directions. The CUED system is based on HiFST, a hierarchical phrase-based decoder implemented using weighted finite-state transducers. In the French-English task, w...
CUED Submission for the WMT10 Translation Shared Task
This paper describes the Cambridge University Engineering Department (CUED) system for the ACL 2010 fifth workshop on statistical machine translation (WMT10). We participated in the French-English and Spanish-English translation shared tasks in both directions. The CUED system is a hierarchical phrase-based system that uses finite-state transducers and lattice rescoring. In the French-English ta...
PORTAGE: A Phrase-Based Machine Translation System
This paper describes the participation of the Portage team at NRC Canada in the shared task of ACL 2005 Workshop on Building and Using Parallel Texts. We discuss Portage, a statistical phrase-based machine translation system, and present experimental results on the four language pairs of the shared task. First, we focus on the French-English task using multiple resources and techniques. Then we...
Publication date: 2016